Skip to content

[OpenTelemetry] Improve HistogramExplicitBounds#7165

Merged
martincostello merged 13 commits into
open-telemetry:mainfrom
martincostello:gh-3428
Jun 9, 2026
Merged

[OpenTelemetry] Improve HistogramExplicitBounds#7165
martincostello merged 13 commits into
open-telemetry:mainfrom
martincostello:gh-3428

Conversation

@martincostello

@martincostello martincostello commented Apr 25, 2026

Copy link
Copy Markdown
Member

Fixes #3428

Changes

Apply recommendations from #3428:

  • Use SIMD vectorization to find buckets.
  • Use radix sort.
  • Apply cap to histogram explicit bounds (10M).

Written primarily by Copilot through an iterative approach.

Final benchmark results are shown below: #7165 (comment)

I haven't benchmarked the ARM64 SIMD branch as I don't have the hardware for that locally.

Merge requirement checklist

  • CONTRIBUTING guidelines followed (license requirements, nullable enabled, static analysis, etc.)
  • Unit tests added/updated
  • Appropriate CHANGELOG.md files updated for non-trivial changes
  • Changes in public API reviewed (if applicable)

Apply recommendations from open-telemetry#3428:

- Use SIMD vectorization to find buckets.
- Use radix sort.
- Apply cap to histogram explicit bounds (10M).

Written primarily by Copilot through an iterative approach.
@github-actions github-actions Bot added pkg:OpenTelemetry Issues related to OpenTelemetry NuGet package perf Performance related labels Apr 25, 2026
@codecov

codecov Bot commented Apr 25, 2026

Copy link
Copy Markdown

Codecov Report

❌ Patch coverage is 95.76271% with 5 lines in your changes missing coverage. Please review.
✅ Project coverage is 89.75%. Comparing base (e4762a0) to head (6e87e47).

Files with missing lines Patch % Lines
...c/OpenTelemetry/Metrics/HistogramExplicitBounds.cs 95.28% 5 Missing ⚠️
Additional details and impacted files

Impacted file tree graph

@@            Coverage Diff             @@
##             main    #7165      +/-   ##
==========================================
- Coverage   89.82%   89.75%   -0.08%     
==========================================
  Files         276      276              
  Lines       14738    14822      +84     
==========================================
+ Hits        13238    13303      +65     
- Misses       1500     1519      +19     
Flag Coverage Δ
unittests-Project-Experimental 89.63% <95.76%> (-0.10%) ⬇️
unittests-Project-Stable 89.66% <95.76%> (-0.07%) ⬇️
unittests-Solution 89.46% <95.76%> (-0.15%) ⬇️

Flags with carried forward coverage won't be shown. Click here to find out more.

Files with missing lines Coverage Δ
src/OpenTelemetry/Metrics/MetricStreamIdentity.cs 87.56% <100.00%> (+0.19%) ⬆️
...trics/View/ExplicitBucketHistogramConfiguration.cs 89.47% <100.00%> (+6.14%) ⬆️
...c/OpenTelemetry/Metrics/HistogramExplicitBounds.cs 95.68% <95.28%> (-4.32%) ⬇️

... and 6 files with indirect coverage changes

Add missing patch coverage.
Fix CS8604 warning.
@martincostello

This comment was marked as outdated.

Add note for histogram boundary upper limit.
@martincostello martincostello marked this pull request as ready for review April 25, 2026 14:55
@martincostello martincostello requested a review from a team as a code owner April 25, 2026 14:55
Copilot AI review requested due to automatic review settings April 25, 2026 14:55

Copilot AI left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Improves histogram explicit-bound bucket lookup performance in the metrics pipeline, and adds a hard cap on the number of explicit boundaries to prevent pathological allocations/processing.

Changes:

  • Replaces the previous large-bound lookup strategy with a radix-partitioned search + SIMD-accelerated linear scan fallback.
  • Adds a 10,000,000 max explicit boundary count check for both view configuration and instrument advice.
  • Adds unit tests, fuzz/property tests, and benchmarks; updates changelog and solution/project wiring.

Reviewed changes

Copilot reviewed 11 out of 11 changed files in this pull request and generated 1 comment.

Show a summary per file
File Description
test/OpenTelemetry.Tests/Metrics/MetricViewTests.cs Adds unit test validating boundary-count limit exception details.
test/OpenTelemetry.Tests/Metrics/AggregatorTests.cs Adds coverage for large-bound lookup edge cases (mixed signs, infinities, NaN, degenerate NaN bounds) and display-bounds cleanup.
test/OpenTelemetry.FuzzTests/OpenTelemetry.FuzzTests.csproj New fuzz test project for OpenTelemetry core metrics.
test/OpenTelemetry.FuzzTests/Metrics/HistogramExplicitBoundsFuzzTests.cs Adds FsCheck property tests comparing optimized path vs scalar baseline.
test/Benchmarks/Metrics/HistogramExplicitBoundsBenchmarks.cs Adds BenchmarkDotNet benchmarks for bucket lookup across sizes/layouts.
src/OpenTelemetry/OpenTelemetry.csproj Grants InternalsVisibleTo for the new OpenTelemetry.FuzzTests project.
src/OpenTelemetry/Metrics/View/ExplicitBucketHistogramConfiguration.cs Introduces MaxBoundaryCount and enforces it when setting Boundaries.
src/OpenTelemetry/Metrics/MetricStreamIdentity.cs Enforces MaxBoundaryCount for instrument advice-provided boundaries.
src/OpenTelemetry/Metrics/HistogramExplicitBounds.cs Implements radix-partitioned + SIMD-assisted bucket index lookup and infinity cleanup.
src/OpenTelemetry/CHANGELOG.md Documents the explicit-boundary cap as a breaking change.
OpenTelemetry.slnx Adds the new fuzz test project to the solution.
Comments suppressed due to low confidence (1)

src/OpenTelemetry/Metrics/View/ExplicitBucketHistogramConfiguration.cs:74

  • IsSortedAndDistinct currently treats double.NaN as valid because all comparisons with NaN return false, so arrays containing NaN can pass validation even though they are not meaningfully “ascending order with distinct values”. Consider explicitly rejecting NaN (and possibly non-finite values) in this check so Boundaries validation matches the documented requirements and downstream bucketing assumptions.
        for (var i = 1; i < values.Length; i++)
        {
            if (values[i] <= values[i - 1])
            {
                return false;

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

Comment thread src/OpenTelemetry/Metrics/MetricStreamIdentity.cs Outdated
Comment thread src/OpenTelemetry/Metrics/MetricStreamIdentity.cs Outdated
@martincostello martincostello added the keep-open Prevents issues and pull requests being closed as stale label May 7, 2026
@Kielek

Kielek commented May 19, 2026

Copy link
Copy Markdown
Member

@cijothomas, could you please check this PR? Especially adding boundaries?

@Kielek Kielek requested a review from cijothomas May 19, 2026 11:10

@Kielek Kielek left a comment

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Some direct feedback while reviewing by Codex, in general LGTM.

It will be great to see Cijo final review, but I will not block this.

@martincostello, I think that you need to solve problems with MemoryMarshal, your preious changes are working ;)

@martincostello

Copy link
Copy Markdown
Member Author

Need to re-run the benchmarks with the latest changes.

@martincostello

Copy link
Copy Markdown
Member Author

Benchmark comparison: main (baseline) vs gh-3428 (candidate) (generated by Claude 🐙)

Benchmarks.Metrics.HistogramExplicitBoundsBenchmarks — BenchmarkDotNet v0.15.8, .NET 10.0.8 (X64 RyuJIT), 13th Gen Intel Core i7-13700H.

TL;DR

  • Allocations: unchanged. Every benchmark allocates 0 B on both branches — this change is allocation-neutral.
  • Special-case lookups (±Infinity, NaN) are dramatically faster. These now short-circuit instead of scanning the bounds, dropping from ~4–21 ns to well under 1 ns (often ~0).
  • General finite-value lookups are ~50% faster at larger bound counts. At BoundCount=49, ExactBoundary / InRangeValue / WithInfiniteBounds all roughly halve (~10.5 ns → ~5.1 ns).
  • At small bound counts (10) the general lookups are roughly neutral (within noise; one case ~5% slower).
  • ⚠️ BoundCount=50 and BoundCount=1000 comparisons for the finite-value lookups are unreliable — several baseline cells report sub-nanosecond means (e.g. 0.0007 ns, 0.0308 ns), which indicate the JIT eliminated the operation as dead code. The candidate measures real work in those cells, so the apparent "regression" there is a measurement artifact, not a true slowdown. I've excluded those from the conclusions above.

Representative results (.NET 10.0, Mean)

Cases below are where both branches measure real work, so the comparison is sound:

Scenario (BoundCount / Layout) Method Baseline Candidate Δ
49 / MixedSigned LookupExactBoundary 10.70 ns 5.14 ns −52%
49 / MixedSigned LookupInRangeValue 10.89 ns 5.03 ns −54%
49 / MixedSigned LookupWithInfiniteBounds 10.45 ns 5.12 ns −51%
49 / MixedSigned LookupPositiveInfinity 20.06 ns 0.54 ns −97%
49 / MixedSigned LookupNaN 15.67 ns 0.05 ns −100%
49 / PositiveOnly LookupExactBoundary 10.41 ns 4.98 ns −52%
49 / PositiveOnly LookupInRangeValue 10.34 ns 4.85 ns −53%
49 / PositiveOnly LookupPositiveInfinity 21.02 ns 0.46 ns −98%
49 / PositiveOnly LookupNaN 15.45 ns ~0 ns −100%
10 / MixedSigned LookupPositiveInfinity 7.84 ns 0.64 ns −92%
10 / MixedSigned LookupNaN 2.87 ns 0.02 ns −99%
10 / MixedSigned LookupExactBoundary 2.47 ns 2.18 ns −12%
10 / MixedSigned LookupWithInfiniteBounds 2.50 ns 2.26 ns −10%
10 / PositiveOnly LookupPositiveInfinity 4.48 ns 0.50 ns −89%
10 / PositiveOnly LookupNaN 3.04 ns ~0 ns −100%
10 / PositiveOnly LookupExactBoundary 2.48 ns 2.61 ns +5% (noise)
10 / PositiveOnly LookupInRangeValue 2.57 ns 2.41 ns −6%

The ±Infinity improvement also holds for the 50 and 1000 bound counts (e.g. LookupNegativeInfinity at 1000/MixedSigned: 3.29 ns → 0.18 ns; LookupPositiveInfinity: 4.13 ns → 0.48 ns) — these baseline cells were not dead-code-eliminated, so they're trustworthy.

Conclusion

A clear performance win with no allocation cost: infinity/NaN edge-case lookups are now effectively free, and the common finite-value lookups are ~50% faster once the bound array is large enough to matter. No genuine regressions were found — the only "slower" numbers come from baseline cells the JIT had optimized away, where the candidate is simply being measured doing real work.

Note: the .NET Framework 4.6.2 rows show the same trend and the same dead-code-elimination artifacts at BoundCount 50/1000.

This was referenced Jun 24, 2026
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

keep-open Prevents issues and pull requests being closed as stale perf Performance related pkg:OpenTelemetry Issues related to OpenTelemetry NuGet package

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Investigate possible histogram performance improvements

4 participants